


M&G No. 10 



What is claimed is: 







1. A data management system, comprising:

a first processor for restoring a plurality of received data files, the data 
files being capable of being different file types;

a file organizing/categorizing processor, coupled to the first processor, 
for organizing the received data files into data slices, each data slice including 
an identification number and a descriptor that describes characteristics of the 
received data file; 

a file logging processor, coupled to the file organizing/categorizing
processor, for logging the received data files into a first database based on the
data slices; 

a data uploading processor, coupled to the file logging processor, for
uploading the first database to a second database; 

a de-duplicate processor, coupled to the data uploading processor, for 
calculating a SHA value of the received data files to determine whether the
received data files have duplicates and flagging duplicated data files in the 
second database; 

an image conversion processor, coupled to the de-duplicate processor, 
for converting at least a portion of the received data files into image files; and 

a second processor, coupled to the image conversion processor, for
exporting the image files. 
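
By way of illustration only, and without limiting the claim, the de-duplicate step may be sketched as follows. The claim does not name a SHA variant; SHA-256 is assumed here, and the function names `sha_digest` and `flag_duplicates` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def sha_digest(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()  # assumption: the claim only says "SHA value"
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def flag_duplicates(paths: Iterable[Path]) -> List[Tuple[Path, bool]]:
    """Pair each path with True when an earlier file had the same digest."""
    seen: Dict[str, Path] = {}
    flagged: List[Tuple[Path, bool]] = []
    for path in paths:
        value = sha_digest(path)
        flagged.append((path, value in seen))
        seen.setdefault(value, path)  # remember the first file with this digest
    return flagged
```

In this sketch duplicates are flagged rather than deleted, which mirrors the claim language of flagging duplicated data files in the second database.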

2. The system of claim 1, wherein the first database is a local database for at least
one data slice, and the second database is a global database for all logged data slices. 

3. The system of claim 1, wherein the image files converted from the data files
are in a standardized image format. 



4. The system of claim 1, wherein the data files are in a variety of formats
including Microsoft Mail, Outlook, GroupWise, Lotus Notes, the user data files have
a variety of formats including Word, Excel, PowerPoint, and Access.

5. The system of claim 1, wherein an attachment data file in one of the data files
is associated with the data file such that image files for the data file and the
corresponding attachment data file are viewed together.

6. The system of claim 1, wherein the file logging processor, the image
conversion processor, and the second processor are parallel processors such that the
data files are parallel-processed in a data file logging stage, an image conversion
stage, and an image file output stage.

7. The system of claim 1, wherein the data files having the same file type are
converted into the image files together.



8. The system of claim 1, wherein the data management system includes a 
plurality of image conversion processors, each of the image conversion processors 
being capable of converting the data files having the same file type into the
corresponding image files. 

9. The system of claim 1, wherein the file logging processor identifies the file
type of the data files based on the SHA value and a file header of each of the data
files.
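
By way of illustration only, file type identification from a file header may be sketched as a lookup of leading byte signatures. The claim does not state how the SHA value participates; the sketch assumes it serves as a key into a table of previously identified files. The names `SIGNATURES`, `identify_type`, and `known` are hypothetical, and SHA-256 is an assumed variant.

```python
import hashlib
from typing import Dict, Optional

# Well-known leading byte signatures; illustrative, not exhaustive.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2",  # legacy Word/Excel/PowerPoint container
    b"PK\x03\x04": "zip",  # container also used by newer Office formats
    b"\x89PNG\r\n\x1a\n": "png",
}


def identify_type(data: bytes, known: Dict[str, str]) -> Optional[str]:
    """Return a file type from the digest table or, failing that, the header."""
    sha = hashlib.sha256(data).hexdigest()
    if sha in known:  # assumption: identical content was already typed
        return known[sha]
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            known[sha] = name  # remember the result under this digest
            return name
    return None  # unrecognized header
```

Matching on the header rather than the file extension means a renamed file is still typed by its content.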

10. The system of claim 1, further comprising a keyword search processor,
coupled to the file logging processor, for searching a keyword from the received data 
files, wherein if there is a hit, the corresponding data file is retained for processing, 
and the data file without a hit is discarded without being processed. 
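
By way of illustration only, the keyword search of this claim may be sketched as a filter over already extracted text. The claim does not specify matching rules; case-insensitive substring matching is assumed, and the name `retain_hits` is hypothetical.

```python
from typing import Dict, Iterable, List


def retain_hits(texts: Dict[str, str], keywords: Iterable[str]) -> List[str]:
    """Return names of files whose text contains at least one keyword."""
    needles = [keyword.lower() for keyword in keywords]
    return [
        name
        for name, text in texts.items()
        if any(needle in text.lower() for needle in needles)
    ]


# Files without a hit are simply absent from the result and are not processed further.
retained = retain_hits(
    {"memo.doc": "Quarterly MERGER update", "lunch.msg": "Pizza on Friday"},
    ["merger"],
)
```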

11. The system of claim 1, further comprising a keyword search processor,
coupled to the image conversion processor, for searching a keyword from the image
files, wherein if there is a hit, the corresponding image file is exported, and the image
file without a hit is not exported. 

12. The system of claim 1, further comprising a file status filter to indicate
different statuses of the received data files. 



13. The system of claim 12, wherein the different statuses comprise New, In-Progress,
Done, Error, Corrupted, Encrypted, No Keyword Hit, Big File, Large Page Count.
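
By way of illustration only, the statuses recited in this claim may be represented as an enumeration, with the file status filter sketched as a selection over status-tagged records. The names `FileStatus` and `filter_by_status` are hypothetical.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class FileStatus(Enum):
    """Statuses recited in the claim, keyed by their display names."""
    NEW = "New"
    IN_PROGRESS = "In-Progress"
    DONE = "Done"
    ERROR = "Error"
    CORRUPTED = "Corrupted"
    ENCRYPTED = "Encrypted"
    NO_KEYWORD_HIT = "No Keyword Hit"
    BIG_FILE = "Big File"
    LARGE_PAGE_COUNT = "Large Page Count"


def filter_by_status(
    records: Iterable[Tuple[str, FileStatus]], wanted: FileStatus
) -> List[str]:
    """Return the names of files whose status equals the wanted status."""
    return [name for name, status in records if status is wanted]
```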

14. A data management method, comprising the steps of:

restoring a plurality of received data files, the data files being capable of being 
different file types; 

organizing/categorizing the received data files into data slices, each data slice 
including an identification number and a descriptor that describes characteristics of 
the received data file; 

logging the received data files into a first database based on the data slices; 

uploading the first database to a second database;

de-duplicating duplicates in the received data files by calculating a SHA value 
of the received data files to determine whether the received data files have duplicates 
and flagging duplicated data files in the database; 

converting at least a portion of the received data files into image files,
respectively; and 

exporting the image files.

15. The method of claim 14, further comprising the step of viewing the image files 
stored in the second database. 



16. The method of claim 14, wherein the step of converting of the data files 
comprises the step of converting the data files into a standardized image format.



17. The method of claim 14, further comprising the step of searching a keyword
from the received data files, if there is a hit, the corresponding data file is retained for
processing, and the data file without a hit is discarded without being processed.

18. The method of claim 14, further comprising the step of searching a keyword
from the image files, if there is a hit, the corresponding image file is exported, and the
image file without a hit is not exported.



